13 research outputs found

    Large expert-curated database for benchmarking document similarity detection in biomedical literature search

    Document recommendation systems for locating relevant literature have mostly relied on methods developed a decade ago. This is largely due to the lack of a large offline gold-standard benchmark of relevant documents that cover a variety of research fields such that newly developed literature search techniques can be compared, improved and translated into practice. To overcome this bottleneck, we have established the RElevant LIterature SearcH consortium consisting of more than 1500 scientists from 84 countries, who have collectively annotated the relevance of over 180 000 PubMed-listed articles with regard to their respective seed (input) article/s. The majority of annotations were contributed by highly experienced, original authors of the seed articles. The collected data cover 76% of all unique PubMed Medical Subject Headings descriptors. No systematic biases were observed across different experience levels, research fields or time spent on annotations. More importantly, annotations of the same document pairs contributed by different scientists were highly concordant. We further show that the three representative baseline methods used to generate recommended articles for evaluation (Okapi Best Matching 25, Term Frequency-Inverse Document Frequency and PubMed Related Articles) had similar overall performances. Additionally, we found that these methods each tend to produce distinct collections of recommended articles, suggesting that a hybrid method may be required to completely capture all relevant articles. The established database server located at https://relishdb.ict.griffith.edu.au is freely available for the downloading of annotation data and the blind testing of new methods. We expect that this benchmark will be useful for stimulating the development of new powerful techniques for title and title/abstract-based search engines for relevant articles in biomedical research.

    Finishing the euchromatic sequence of the human genome

    The sequence of the human genome encodes the genetic instructions for human physiology, as well as rich information about human evolution. In 2001, the International Human Genome Sequencing Consortium reported a draft sequence of the euchromatic portion of the human genome. Since then, the international collaboration has worked to convert this draft into a genome sequence with high accuracy and nearly complete coverage. Here, we report the result of this finishing process. The current genome sequence (Build 35) contains 2.85 billion nucleotides interrupted by only 341 gaps. It covers ∼99% of the euchromatic genome and is accurate to an error rate of ∼1 event per 100,000 bases. Many of the remaining euchromatic gaps are associated with segmental duplications and will require focused work with new methods. The near-complete sequence, the first for a vertebrate, greatly improves the precision of biological analyses of the human genome including studies of gene number, birth and death. Notably, the human genome seems to encode only 20,000-25,000 protein-coding genes. The genome sequence reported here should serve as a firm foundation for biomedical research in the decades ahead.

    Porcine Parvovirus 7: Evolutionary Dynamics and Identification of Epitopes toward Vaccine Design

    Porcine parvovirus 7 (PPV7), belonging to the genus Chapparvovirus in the family Parvoviridae, has been identified in the USA, Sweden, Poland, China, South Korea and Brazil. Our objective was to determine the phylogeny, estimate the time of origin and evolutionary dynamics of PPV7, and use computer-based immune-informatics to assess potential epitopes of its Cap, the main antigenic viral protein, for vaccines or serology. Regarding evolutionary dynamics, PPV7 had 2 major clades, both of which possibly had a common ancestor in 2004. Furthermore, PPV7 strains from China were the most likely ancestral strains. The nucleotide substitution rates of NS1 and Cap genes were 8.01 × 10⁻⁴ and 2.19 × 10⁻³ per site per year, respectively, which were higher than those reported for PPV1-4. The antigenic profiles of PPV7 Cap were revealed and there were indications that PPV7 used antigenic shift to escape from the host’s immune surveillance. Linear B cell epitopes and CD8 T cell epitopes of Cap with good antigenic potential were identified in silico; these conserved B cell epitopes may be candidates for the PPV7 vaccine or for the development of serological diagnostic methods.

    Genotype Distribution of Human Papillomavirus among Women with Cervical Cytological Abnormalities or Invasive Squamous Cell Carcinoma in a High-Incidence Area of Esophageal Carcinoma in China

    HPV genotype data, covering 16 high-risk HPV (HR-HPV) and 4 low-risk HPV types, from 38,397 women with normal cytology, 1341 women with cervical cytology abnormalities, and 223 women with ISCC were retrospectively evaluated in a hospital-based study. The prevalence of HR-HPV was 6.51%, 41.83%, and 96.86% in women with normal cytology, cervical cytology abnormalities, and ISCC, respectively. The three most common HPV types were HPV-52 (1.76%), HPV-16 (1.28%), and HPV-58 (0.97%) in women with normal cytology, whereas the most prevalent HPV type was HPV-16 (16.85%), followed by HPV-52 (9.55%) and HPV-58 (7.83%) in women with cervical cytology abnormalities. Specifically, HPV-16 had the highest frequency in ASC-H (24.16%, 36/149) and HSIL (35.71%, 110/308), while HPV-52 was the most common type in ASC-US (8.28%, 53/640) and LSIL (16.80%, 41/244). HPV-16 (75.78%), HPV-18 (10.31%), and HPV-58 (9.87%) were the most common types in women with ISCC. These data may add to the knowledge of HPV epidemiology and help guide vaccine selection for women in Shantou.

    Immunoinformatic Analysis of T- and B-Cell Epitopes for SARS-CoV-2 Vaccine Design

    Currently, there is limited knowledge about the immunological profiles of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2). We used computer-based immunoinformatic analysis and the newly resolved 3-dimensional (3D) structures of the SARS-CoV-2 S trimeric protein, together with analyses of the immunogenic profiles of SARS-CoV, to anticipate potential B-cell and T-cell epitopes of the SARS-CoV-2 S protein for vaccine design, particularly for peptide-driven vaccine design and serological diagnosis. Nine conserved linear B-cell epitopes and multiple discontinuous B-cell epitopes composed of 69 residues on the surface of the SARS-CoV-2 trimeric S protein were predicted to be highly antigenic. We found that the SARS-CoV-2 S protein has a different antigenic profile from that of the SARS-CoV S protein due to the variations in their primary and 3D structures. Importantly, SARS-CoV-2 may exploit an immune evasion mechanism through two point mutations in the critical and conserved linear neutralization epitope (overlapping with the fusion peptide) around a sparsely glycosylated area. These mutations lead to a significant decrease in the antigenicity of this epitope in the SARS-CoV-2 S protein. In addition, 62 T-cell epitopes in the SARS-CoV-2 S protein were predicted in our study. The structure-based immunoinformatic analysis for the SARS-CoV-2 S protein in this study may improve vaccine design, diagnosis, and immunotherapy against the pandemic of COVID-19.

    Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through Analysis of Viral Genomics and Structure

    The novel coronavirus SARS-CoV-2, which emerged in late 2019, has since spread around the world and infected hundreds of millions of people with coronavirus disease 2019 (COVID-19). While this viral species was unknown prior to January 2020, its similarity to other coronaviruses that infect humans has allowed for rapid insight into the mechanisms that it uses to infect human hosts, as well as the ways in which the human immune system can respond. Here, we contextualize SARS-CoV-2 among other coronaviruses and identify what is known and what can be inferred about its behavior once inside a human host. Because the genomic content of coronaviruses, which specifies the virus's structure, is highly conserved, early genomic analysis provided a significant head start in predicting viral pathogenesis and in understanding potential differences among variants. The pathogenesis of the virus offers insights into symptomatology, transmission, and individual susceptibility. Additionally, prior research into interactions between the human immune system and coronaviruses has identified how these viruses can evade the immune system's protective mechanisms. We also explore systems-level research into the regulatory and proteomic effects of SARS-CoV-2 infection and the immune response. Understanding the structure and behavior of the virus serves to contextualize the many facets of the COVID-19 pandemic and can influence efforts to control the virus and treat the disease.

    Pathogenesis, Symptomatology, and Transmission of SARS-CoV-2 through analysis of Viral Genomics and Structure

    The novel coronavirus SARS-CoV-2, which emerged in late 2019, has since spread around the world, infecting tens of millions of people with coronavirus disease 2019 (COVID-19). While this viral species was unknown prior to January 2020, its similarity to other coronaviruses that infect humans has allowed for rapid insight into the mechanisms that it uses to infect human hosts, as well as the ways in which the human immune system can respond. Here, we contextualize SARS-CoV-2 among other coronaviruses and identify what is known and what can be inferred about its behavior once inside a human host. Because the genomic content of coronaviruses, which specifies the virus's structure, is highly conserved, early genomic analysis provided a significant head start in predicting viral pathogenesis. The pathogenesis of the virus offers insights into symptomatology, transmission, and individual susceptibility. Additionally, prior research into interactions between the human immune system and coronaviruses has identified how these viruses can evade the immune system's protective mechanisms. We also explore systems-level research into the regulatory and proteomic effects of SARS-CoV-2 infection and the immune response. Understanding the structure and behavior of the virus serves to contextualize the many facets of the COVID-19 pandemic and can influence efforts to control the virus and treat the disease.